Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network
نویسندگان
چکیده
We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
منابع مشابه
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks SUPPLEMENTAL MATERIAL
We introduced two methods to generate documents. In the first method, we generate LaTeX source files in which elements like paragraphs, figures, tables, captions, section headings and lists are randomly arranged using the “textblock” environment from the “textpos” package. Compiling these LaTeX files gives single, double, or triplecolumn PDFs. The generation process is summarized in Algorithm 1.
متن کاملA Two-Dimensional Convolutional Neural Network for Brain Tumor Detection From MRI
Aims: Cancerous brain tumors are among the most dangerous diseases that lower the quality of life of people for many years. Their detection in the early stages paves the way for the proper treatment. The present study aimed to present a two-dimensional Convolutional Neural Network (CNN) for detecting brain tumors under Magnetic Resonance Imaging (MRI) using the deep learning method. Methods & ...
متن کاملIntegration of Deep Learning Algorithms and Bilateral Filters with the Purpose of Building Extraction from Mono Optical Aerial Imagery
The problem of extracting the building from mono optical aerial imagery with high spatial resolution is always considered as an important challenge to prepare the maps. The goal of the current research is to take advantage of the semantic segmentation of mono optical aerial imagery to extract the building which is realized based on the combination of deep convolutional neural networks (DCNN) an...
متن کاملEMG-based wrist gesture recognition using a convolutional neural network
Background: Deep learning has revolutionized artificial intelligence and has transformed many fields. It allows processing high-dimensional data (such as signals or images) without the need for feature engineering. The aim of this research is to develop a deep learning-based system to decode motor intent from electromyogram (EMG) signals. Methods: A myoelectric system based on convolutional ne...
متن کاملA Convolutional Neural Network based on Adaptive Pooling for Classification of Noisy Images
Convolutional neural network is one of the effective methods for classifying images that performs learning using convolutional, pooling and fully-connected layers. All kinds of noise disrupt the operation of this network. Noise images reduce classification accuracy and increase convolutional neural network training time. Noise is an unwanted signal that destroys the original signal. Noise chang...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1706.02337 شماره
صفحات -
تاریخ انتشار 2017